This page last changed on Aug 24, 2008 by iank@bearcave.com.

This page discusses the time series segmentation algorithm developed by Eamonn Keogh and his students at UC Riverside.

A time series algorithm that is useful for trading has to produce a useful signal value on the right side. That is, the most recent data. This algorithm seems to generate its line segments using a moving windows, where the segment generated consists of the oldest time series elements. If this analysis is correct, this algorithm is not very useful for building trading models.

Another problem is that the Keogh's algorithm is in MatLab and rather obscure. For example, he uses the polyfit function. MathWorks (the makers of MatLab) publishes fairly detailed descriptions of their algorithm and this one doesn't look simple. This makes this algorithm difficult to reproduce in a language like Java.

To simulate a financial time series, Eamonn used a random walk. The MatLab code is shown below:

x=random_walk(1000,1);
bottom_up(x, 20, 1)
function data = random_walk(length_of_ts,number_of_ts)
% Creates some test data.
data = [];

for i = 1 : number_of_ts
    x = 0; 
    for j = 2 : length_of_ts
        x(j) = x(j-1) + randn;
    end;
    data = [data  x'];
end;
function segment = bottom_up(data,num_segments,forceplot)
% This function approximates a time series by a sequence of piecewise linear segments. 
% This version is uncommented, email the author for details.

left_x = [1 : 2 : size(data,1)-1];
right_x      = left_x + 1;
right_x(end) = size(data,1);
number_of_segments = length(left_x );

for i = 1 : number_of_segments 
   segment(i).lx = left_x(i);
   segment(i).rx = right_x(i);
   segment(i).mc = inf;
end;

for i = 1 : number_of_segments - 1
   coef = polyfit([segment(i).lx :segment(i+1).rx]',data(segment(i).lx :segment(i+1).rx),1);
   best = (coef(1)*( [segment(i).lx :segment(i+1).rx]' ))+coef(2);
   segment(i).mc = sum((data([segment(i).lx :segment(i+1).rx]')-best).^2);
end;

while  length(segment) > num_segments  
   [value, i ] = min([segment(:).mc]);

   if i > 1 & i < length(segment) -1								
    	coef = polyfit([segment(i).lx :segment(i+2).rx]',data(segment(i).lx :segment(i+2).rx),1);
    	best = (coef(1)*( [segment(i).lx :segment(i+2).rx]' ))+coef(2);
        segment(i).mc = sum((data([segment(i).lx :segment(i+2).rx]')-best).^2);
	 segment(i).rx = segment(i+1).rx;
    	segment(i+1) = [];
    	i = i - 1;
        coef = polyfit([segment(i).lx :segment(i+1).rx]',data(segment(i).lx :segment(i+1).rx),1);
    	best = (coef(1)*( [segment(i).lx :segment(i+1).rx]' ))+coef(2);
      segment(i).mc = sum((data([segment(i).lx :segment(i+1).rx]')-best).^2);
   elseif i == 1
	coef = polyfit([segment(i).lx :segment(i+2).rx]',data(segment(i).lx :segment(i+2).rx),1);
    	best = (coef(1)*( [segment(i).lx :segment(i+2).rx]' ))+coef(2);
    	segment(i).mc = sum((data([segment(i).lx :segment(i+2).rx]')-best).^2);
	 segment(i).rx = segment(i+1).rx;
        segment(i+1) = [];      
   else
      segment(i).rx = segment(i+1).rx;
      segment(i).mc = inf;
      segment(i+1) = [];
      i = i - 1;
      coef = polyfit([segment(i).lx :segment(i+1).rx]',data(segment(i).lx :segment(i+1).rx),1);
      best = (coef(1)*( [segment(i).lx :segment(i+1).rx]' ))+coef(2);
      segment(i).mc = sum((data([segment(i).lx :segment(i+1).rx]')-best).^2);
   end;
end;

%----------------------------------------------

residuals=[];

for i = 1 : length(segment) 
        coef = polyfit([segment(i).lx :segment(i).rx]',data(segment(i).lx :segment(i).rx),1);
    	best = (coef(1)*( [segment(i).lx :segment(i).rx]' ))+coef(2);
        residuals = [    residuals ; sum((data([segment(i).lx :segment(i).rx]')-best).^2)];
end;

if nargin > 2
    hold on;
    plot(data,'r');
end; 

    temp = [];
    for i = 1 : length(segment) 
      coef = polyfit([segment(i).lx :segment(i).rx]',data(segment(i).lx :segment(i).rx),1);
      best = (coef(1)*( [segment(i).lx :segment(i).rx]' ))+coef(2);
      segment(i).ly = best(1);
      segment(i).ry = best(end);
      segment(i).mc = [];
        if nargin > 2
            plot([segment(i).lx :segment(i).rx]', best,'b');
        end;
      temp = [temp; [segment(i).ly segment(i).ry]];
    end;

if nargin > 2
    for i = 1 : length(segment)  - 1 
        plot([segment(i).rx :segment(i+1).lx]', [ temp(i,2) temp(i+1,1)  ],'g');
    end;
end;

References

  1. Online Amnesic Approximation of Streaming Time Series T. Palpanas, M. Vlachos, E. Keogh, D. Gunopulos, W. Truppel (2004), IX Conference on Extending Database Technology
  2. An Online Algorithm for Segmenting Time Series Keogh, E., Chu, S., Hart, D. & Pazzani, M. (2001), Proceedings of IEEE International Conference on Data Mining. pp 289-296
  3. Streaming Time Series Summarization Using User-Defined Amnesic Functions by Palpanas, T. Vlachos, M. Keogh, E. Gunopulos, D., (July 2008) IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 7 pgs. 992-1006

segmenting_time_series.pdf (application/x-download)
random_walk_fit.jpg (image/jpeg)